AMENDMENTS TO THE SPECIFICATION:

The following amendments to the specification are requested. 

Please replace the paragraph starting at page 12, line 15, and extending to page 13, line 8, 
with the following paragraph, wherein only the arrangement of the equations with their 
numerical caption has been clarified: 

According to a preferred form of the invention, a non-parametric modeling approach is 
used that is uniquely capable of rendering estimates of variables of a complex system in 
operation, thus providing unique residuals and alerts between the actual values and the 
estimates. More preferably, a kernel-based non-parametric approach is used where a function, 
or "kernel", is used to combine learned observations in a weighted fashion based on the input 
observation to generate model results. The similarity-based approach is a kernel-based non- 
parametric model, capable of rendering useful estimates over a wide range of operation in 
contrast to parametric approaches like linear regression or neural networks, which tend to be 
only locally accurate. Kernel regression provides another kernel-based non-parametric 
estimator for use in the invention. Using a non-parametric model provides for purely data- 
driven modeling which avoids an investment in first-principles modeling and in tuning 
parametric estimators (such as neural networks), and provides for novel residual and alert 
precursors of failures for diagnostic purposes. A suitable kernel-based non-parametric model 
for use in the present invention is generally described by the equation: 

$$Y_{\text{estimated}} = C \cdot K(X_{\text{in}}, D) \tag{A}$$

where estimated sensor readings $Y_{\text{estimated}}$ are determined from the results of the kernel function
K operating on the input observation vector $X_{\text{in}}$ and the set of learned observations in D,
weighted according to some weight matrix C. In an alternative form, the kernel responses can
be normalized to account for non-normalized data:

$$Y_{\text{estimated}} = C \cdot \frac{K(X_{\text{in}}, D)}{M} \tag{B}$$

where M is some normalization factor.

Please replace the paragraph starting at page 13, line 9, and extending to page 14, line 19, 
with the following paragraph, wherein only the arrangement of the equations with their 
numerical caption has been clarified: 

According to the similarity operator-based empirical modeling technique, for a given set
of contemporaneous sensor data from the monitored process or machine running in real-time,
the estimates for the sensors can be generated according to:

$$Y_{\text{estimated}} = D \cdot W \tag{1}$$

where the vector Y of estimated values for the sensors is equal to the contributions from each of
the snapshots of contemporaneous sensor values arranged to comprise matrix D (the reference
library or reference set). These contributions are determined by weight vector W (not to be
confused with weights C in equations A and B above). The multiplication operation is the
standard matrix/vector multiplication operator. The vector Y has as many elements as there
are sensors of interest in the monitored process or machine. W has as many elements as there
are reference snapshots in D. W is determined by:

$$W = \frac{\hat{W}}{\sum_{j=1}^{N} \hat{W}(j)} \tag{2}$$

$$\hat{W} = \left(D^{T} \otimes D\right)^{-1} \cdot \left(D^{T} \otimes Y_{\text{in}}\right) \tag{3}$$

or in terms of equation B:

$$Y_{\text{estimated}} = \left[D \cdot \left(D^{T} \otimes D\right)^{-1}\right] \cdot \frac{\left(D^{T} \otimes Y_{\text{in}}\right)}{\sum_{j=1}^{N} \hat{W}(j)} \tag{3A}$$

$$Y_{\text{estimated}} = [C] \cdot \frac{K(Y_{\text{in}}, D)}{M} \tag{3B}$$

where the T superscript denotes transpose of the matrix, and $Y_{\text{in}}$ is the current snapshot of
actual, real-time sensor data. The similarity operator is symbolized in Equation 3, above, as the
circle with the "X" disposed therein. Moreover, D is again the reference library as a matrix, and
$D^{T}$ represents the standard transpose of that matrix (i.e., rows become columns). $Y_{\text{in}}$ is the
real-time or actual sensor values from the underlying system, and therefore is a vector snapshot. As
mentioned above, the step of normalizing the W values in Equation 2 can be performed to
improve modeling when the input data and training data have not been converted to
normalized ranges. Furthermore, the similarity-based modeling approach can be used in an
inferential mode, where estimates are made for variables which are not present as inputs, or the
autoassociative case, where estimates are made for the inputs. In the inferential case, the D
matrix can be separated into two parts, the first part of which corresponds to the inputs and is
used in the kernel K, and the second part of which corresponds to the inferred variables and is
in the numerator of C.
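
By way of non-limiting illustration, the autoassociative estimate of Equations 1 through 3 might be sketched as follows. The specification does not fix a particular similarity operator in this passage, so the function `similarity` (one divided by one plus the Euclidean distance) is purely an assumption chosen for the example, as are the names `sim_matrix` and `sbm_estimate` and the layout of D with one column per reference snapshot.

```python
import numpy as np

def similarity(a, b):
    # Assumed similarity operator (not taken from the specification):
    # equals 1 for identical snapshots and decays toward 0 with distance.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def sim_matrix(A, B):
    # Apply the similarity operator between every column of A and every
    # column of B, mirroring the "transpose, then similarity" pattern.
    return np.array([[similarity(a, b) for b in B.T] for a in A.T])

def sbm_estimate(D, y_in):
    # D: (sensors x snapshots) reference library; y_in: current snapshot.
    G = sim_matrix(D, D)                          # D^T (x) D
    k = sim_matrix(D, y_in.reshape(-1, 1))[:, 0]  # D^T (x) Y_in
    w_hat = np.linalg.solve(G, k)                 # Equation 3
    w = w_hat / w_hat.sum()                       # Equation 2 (normalization)
    return D @ w                                  # Equation 1
```

Under these assumptions, an input that coincides with a reference snapshot is reproduced by the estimate, since the weight vector then selects that snapshot alone. The sketch does not guard against a weight sum of zero or a singular similarity matrix.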

Please replace the paragraph starting at page 19, line 10, and extending to page 20, line 5, 
with the following paragraph, wherein missing equations 14, 14A and 14B are hereby inserted: 

Another example of a kernel-based non-parametric empirical modeling method that can 
be used in the present invention to generate estimates of the process or machine being 
monitored is kernel regression, or kernel smoothing. A kernel regression can be used to 
generate an estimate based on a current observation in much the same way as the similarity- 
based model, which can then be used to generate a residual as detailed elsewhere herein. 
Accordingly, the following Nadaraya-Watson estimator can be used: 

$$\hat{y}(X, h) = \frac{\sum_{i=1}^{n} K_h(X - X_i)\, y_i}{\sum_{i=1}^{n} K_h(X - X_i)} \tag{13}$$

where in this case a single scalar inferred parameter y-hat is estimated as a sum of weighted
exemplar $y_i$ from training data, where the weight is determined by a kernel K of width h acting
on the difference between the current observation X and the exemplar observations $X_i$
corresponding to the $y_i$ from training data. The independent variables $X_i$ can be scalars or
vectors. Alternatively, the estimate can be a vector, instead of a scalar:

$$Y_{\text{estimated}}(X, h) = \frac{\sum_{i=1}^{n} K_h(X - X_i)\, Y_i}{\sum_{i=1}^{n} K_h(X - X_i)} \tag{14}$$

Here, the scalar kernel multiplies the vector Yi to yield the estimated vector. 
Put into terms of equation A above: 



$$Y_{\text{estimated}}(X, h) = \frac{Y_D}{\sum_{i=1}^{n} K_h(X - X_i)} \cdot K_h(X, X_D) \tag{14A}$$

$$Y_{\text{estimated}}(X, h) = [C] \cdot \left[K_h(X, X_D)\right] \tag{14B}$$

where matrix $Y_D$ is the collection of learned output observations $Y_i$ and matrix $X_D$ is the
collection of learned input observations $X_i$.
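
As an illustrative sketch only, the vector form of the Nadaraya-Watson estimator described above might be written as follows. The function names, the row-per-observation layout of `X_d` and `Y_d`, and the choice of an unnormalized Gaussian weight on the Euclidean distance are assumptions made for the example; any constant factor in the kernel cancels between numerator and denominator.

```python
import numpy as np

def gaussian_weight(u, h):
    # Assumed kernel: unnormalized Gaussian of the distance u, bandwidth h.
    return np.exp(-0.5 * (u / h) ** 2)

def nadaraya_watson(x, X_d, Y_d, h=1.0):
    # x:   (p,)   current input observation
    # X_d: (n, p) learned input observations, one per row
    # Y_d: (n, q) learned output observations, one per row
    u = np.linalg.norm(X_d - x, axis=1)   # distance to each exemplar X_i
    k = gaussian_weight(u, h)             # kernel response per exemplar
    return (k @ Y_d) / k.sum()            # kernel-weighted average of the Y_i
```

A narrow bandwidth makes the estimate follow the nearest exemplar closely, while a wide bandwidth averages over many exemplars. If every kernel response underflows to zero, the division is undefined, which this sketch does not handle.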

Please replace the paragraph starting at page 20, line 6, and extending to line 14, with 
the following paragraph, wherein missing equations 15 and 16 are hereby inserted, and 
reference to Equation 6 is amended to refer to Equation 13: 



A wide variety of kernels are known in the art and may be used. One well-known
kernel, by way of example, is the Epanechnikov kernel:

$$K_h(u) = \begin{cases} \dfrac{3}{4h}\left(1 - \dfrac{u^2}{h^2}\right); & |u| \le h \\ 0; & |u| > h \end{cases} \tag{15}$$

where h is the bandwidth of the kernel, a tuning parameter, and u can be obtained from the
difference between the current observation and the exemplar observations as in
Equation 13. Another kernel of the countless kernels that can be used in remote monitoring
according to the invention is the common Gaussian kernel (like the Gaussian kernel of the
abovementioned radial basis function):

$$K_h(X - X_i) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(X - X_i)^2}{2}} \tag{16}$$
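
For illustration, the two kernels named above might be coded as shown below. The Epanechnikov function uses the standard bandwidth-scaled form, and the Gaussian function uses the standard normal density of a difference `u` that is assumed to have been scaled beforehand; both function names are hypothetical.

```python
import numpy as np

def epanechnikov(u, h):
    # Parabolic kernel with bandwidth h; zero outside |u| <= h.
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= h
    return np.where(inside, 0.75 / h * (1.0 - (u / h) ** 2), 0.0)

def gaussian(u):
    # Standard normal density evaluated at the (pre-scaled) difference u.
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
```

The Epanechnikov kernel has finite support, so exemplars farther than h from the current observation contribute nothing, whereas the Gaussian kernel gives every exemplar some nonzero weight.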

Please replace the paragraph starting at page 22, line 14, and extending to page 23, line 2,
with the following paragraph, wherein missing equation 5 is hereby inserted:

The basic approach of the SPRT technique is to analyze successive observations of a
sampled parameter. A sequence of sampled differences between the estimate and the actual for
a monitored parameter should be distributed according to some kind of distribution function
around a mean of zero. Typically, this will be a Gaussian distribution, but it may be a different
distribution, as for example a binomial distribution for a parameter that takes on only two
discrete values (this can be common in telecommunications and networking machines and
processes). Then, with each observation, a test statistic is calculated and compared to one or
more decision limits or thresholds. The SPRT test statistic generally is the likelihood ratio $l_n$,
which is the ratio of the probability that a hypothesis $H_1$ is true to the probability that a
hypothesis $H_0$ is true:

$$l_n = \frac{L(y_1, y_2, \ldots, y_n \mid H_1)}{L(y_1, y_2, \ldots, y_n \mid H_0)} \tag{5}$$

where $y_n$ are the individual observations and $H_n$ are the probability distributions for those
hypotheses. This general SPRT test ratio can be compared to a decision threshold to reach a
decision with any observation. For example, if the outcome is greater than 0.80, then decide $H_1$
is the case, if less than 0.20 then decide $H_0$ is the case, and if in between then make no decision.

Please replace the paragraph starting at page 23, line 15, and extending to page 25, line 
15, with the following paragraph, wherein missing equations 6, 7, 8, 9, 10, 11 and 12 are hereby 
inserted: 



For residuals derived from known normal operation, the mean is zero, and the variance
can be determined. Then in run-time monitoring mode, for the mean SPRT test, the likelihood
that $H_0$ is true (mean is zero and variance is $\sigma^2$) is given by:

$$L(y_1, y_2, \ldots, y_n \mid H_0) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{6}$$

and similarly, for $H_1$, where the mean is M (typically one standard deviation below or above
zero, using the variance determined for the residuals from normal operation) and the variance
is again $\sigma^2$ (variance is assumed the same):

$$L(y_1, y_2, \ldots, y_n \mid H_1) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2 - 2\sum_{k=1}^{n} y_k M + \sum_{k=1}^{n} M^2\right)\right] \tag{7}$$

The ratio $l_n$ from Equations 6 and 7 then becomes:

$$l_n = \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M(M - 2y_k)\right] \tag{8}$$

A SPRT statistic can be defined for the mean test to be the exponent in Equation 8:

$$SPRT_{mean} = -\frac{1}{2\sigma^2}\sum_{k=1}^{n} M(M - 2y_k) \tag{9}$$



The SPRT test is advantageous because a user-selectable false alarm probability $\alpha$ and a missed
alarm probability $\beta$ can provide thresholds against which $SPRT_{mean}$ can be tested to produce a
decision:

1. If $SPRT_{mean} < \ln(\beta/(1-\alpha))$, then accept hypothesis $H_0$ as true;

2. If $SPRT_{mean} > \ln((1-\beta)/\alpha)$, then accept hypothesis $H_1$ as true; and

3. If $\ln(\beta/(1-\alpha)) < SPRT_{mean} < \ln((1-\beta)/\alpha)$, then make no decision and continue sampling.
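
The mean test and its three-way decision rule can be sketched, by way of example only, as follows. The function names and the string labels returned by `decide_mean` are hypothetical; the statistic is the sum over residuals of M(M - 2y) scaled by -1/(2 sigma^2), and the thresholds are the two logarithmic limits stated in the list above.

```python
import math

def sprt_thresholds(alpha, beta):
    # Lower and upper decision limits from the false alarm probability
    # (alpha) and the missed alarm probability (beta).
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

def sprt_mean(residuals, M, sigma2):
    # Log-likelihood ratio for a mean shift of M with common variance sigma2.
    return -sum(M * (M - 2.0 * y) for y in residuals) / (2.0 * sigma2)

def decide_mean(residuals, M, sigma2, alpha, beta):
    stat = sprt_mean(residuals, M, sigma2)
    lower, upper = sprt_thresholds(alpha, beta)
    if stat < lower:
        return "H0"        # residual mean consistent with zero
    if stat > upper:
        return "H1"        # residual mean consistent with a shift of M
    return "continue"      # not enough evidence yet; keep sampling
```

In a running monitor the statistic would ordinarily be accumulated one residual at a time and reset after each decision; that bookkeeping is omitted here for brevity.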

For the variance SPRT test, the problem is to decide between two hypotheses: $H_2$ where the
residual forms a Gaussian probability density function with a mean of zero and a variance of
$V\sigma^2$; and $H_0$ where the residual forms a Gaussian probability density function with a mean of
zero and a variance of $\sigma^2$. The likelihood that $H_2$ is true is given by:

$$L(y_1, y_2, \ldots, y_n \mid H_2) = \frac{1}{(2\pi V\sigma^2)^{n/2}} \exp\left[-\frac{1}{2V\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{10}$$



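The ratio $l_n$ is then provided for the variance SPRT test as the ratio of Equation 10 over
Equation 6, to provide:

$$l_n = V^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2 \left(\frac{1-V}{V}\right)\right] \tag{11}$$

and the SPRT statistic for the variance test is then:

$$SPRT_{variance} = \frac{1}{2\sigma^2}\left(\frac{V-1}{V}\right)\sum_{k=1}^{n} y_k^2 - \frac{n}{2}\ln V \tag{12}$$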

Thereafter, the above tests (1) through (3) can be applied as above:

1. If $SPRT_{variance} < \ln(\beta/(1-\alpha))$, then accept hypothesis $H_0$ as true;

2. If $SPRT_{variance} > \ln((1-\beta)/\alpha)$, then accept hypothesis $H_2$ as true; and

3. If $\ln(\beta/(1-\alpha)) < SPRT_{variance} < \ln((1-\beta)/\alpha)$, then make no decision and continue sampling.
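
A comparable sketch for the variance test follows, again offered only as an illustration. The statistic coded here is the logarithm of the ratio of two zero-mean Gaussian likelihoods with variances V times sigma^2 and sigma^2, which is an assumption about the intended form of the statistic; the function names and returned labels are hypothetical.

```python
import math

def sprt_variance(residuals, V, sigma2):
    # Log-likelihood ratio of variance V*sigma2 (H2) against sigma2 (H0),
    # both hypotheses assuming zero-mean Gaussian residuals.
    n = len(residuals)
    sum_sq = sum(y * y for y in residuals)
    return sum_sq * (V - 1.0) / (2.0 * sigma2 * V) - 0.5 * n * math.log(V)

def decide_variance(residuals, V, sigma2, alpha, beta):
    stat = sprt_variance(residuals, V, sigma2)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if stat < lower:
        return "H0"        # variance consistent with normal operation
    if stat > upper:
        return "H2"        # variance consistent with a V-fold increase
    return "continue"      # keep sampling
```

Small residuals drive the statistic down through the logarithmic term, while residuals that are large relative to sigma drive it up through the sum of squares.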

Each snapshot that is passed to the SPRT test module can have SPRT test decisions for positive
mean, negative mean, and variance for each parameter in the snapshot. In an empirical
model-based monitoring system according to the present invention, any such SPRT test on any such
parameter that results in a hypothesis other than $H_0$ being accepted as true is effectively an
alert on that parameter. Of course, it lies within the scope of the invention for logic to be
inserted between the SPRT tests and the output alerts, such that a combination of a non-$H_0$
result is required for both the mean and variance SPRT tests in order for the alert to be
generated for the parameter, or some other such rule.